In this section, we explain Cohen's kappa coefficient, a
measure of agreement, which appeared in the article titled 'Impact
of Clinical Performance Examination on Incoming Interns'
Clinical Competency in Differential Diagnosis of Headache' by
Park et al., published in March 2014.

KAPPA, A MEASURE OF AGREEMENT 

Cohen's kappa coefficient is a statistical measure of inter-rater
agreement or inter-instrument agreement for qualitative
(categorical) items. It is generally thought to be a more robust
measure than a simple percent agreement calculation because kappa
takes into account agreement occurring by chance. A chance-corrected
measure originally introduced by Scott was extended
by Cohen and has come to be known as Cohen's kappa. It
rests on the notion that the observed cases of agreement
include some cases in which the agreement occurred by chance alone.

Let us assume that there are two raters who independently
rate n subjects into one of two mutually exclusive and exhaustive
nominal categories. Let $p_{ij}$ be the proportion of subjects placed in
the $(i, j)$th cell, i.e., assigned to the $i$th category by the first
rater and to the $j$th category by the second rater ($i, j = 1, 2$). Also,
let $p_{i+} = p_{i1} + p_{i2}$ denote the proportion of subjects placed in the
$i$th row (i.e., the $i$th category by the first rater), and let
$p_{+j} = p_{1j} + p_{2j}$ denote the proportion of subjects placed in the
$j$th column (i.e., the $j$th category by the second rater). Then the
kappa coefficient is

$$\kappa = \frac{p_o - p_e}{1 - p_e},$$

where $p_o = p_{11} + p_{22}$ is the observed proportion of agreement
and $p_e = p_{1+}p_{+1} + p_{2+}p_{+2}$ is the proportion of agreement
expected by chance.
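A minimal Python sketch of the formula above, assuming the 2 × 2 table is supplied as raw counts with rows for the first rater and columns for the second. The function name `cohens_kappa_2x2` is hypothetical, chosen for illustration. It converts counts to the proportions $p_{ij}$, forms the marginals $p_{i+}$ and $p_{+j}$, and returns $(p_o - p_e)/(1 - p_e)$.

```python
def cohens_kappa_2x2(table):
    """Cohen's kappa for a 2 x 2 table of counts (hypothetical helper).

    table[i][j] is the number of subjects placed in category i by the
    first rater and category j by the second rater (0-based indices here).
    """
    n = sum(sum(row) for row in table)
    if n == 0:
        raise ValueError("table must contain at least one subject")
    # Cell proportions p_ij
    p = [[count / n for count in row] for row in table]
    # Row marginals p_{i+} (first rater) and column marginals p_{+j} (second rater)
    p_row = [p[i][0] + p[i][1] for i in range(2)]
    p_col = [p[0][j] + p[1][j] for j in range(2)]
    p_o = p[0][0] + p[1][1]                          # observed agreement
    p_e = p_row[0] * p_col[0] + p_row[1] * p_col[1]  # chance agreement
    if p_e == 1:
        # Both raters put every subject in the same single category: kappa is undefined
        raise ValueError("kappa is undefined when p_e = 1")
    return (p_o - p_e) / (1 - p_e)


print(cohens_kappa_2x2([[25, 0], [0, 75]]))  # perfect agreement -> 1.0
print(cohens_kappa_2x2([[0, 50], [50, 0]]))  # no agreement, p_e = 0.5 -> -1.0
```

The two calls illustrate the extremes: perfect agreement gives 1, and complete disagreement with evenly split marginals gives $-1$.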

If there is complete agreement, $\kappa = 1$. If observed agreement is
greater than chance agreement, $\kappa > 0$; if it equals chance agreement,
$\kappa = 0$; and if observed agreement is less than chance agreement,
$\kappa < 0$. The minimum value of $\kappa$ depends on the marginal
proportions. If they are such that $p_e = 0.5$, then the minimum equals
$-1$. Otherwise, the minimum is between $-1$ and 0.
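A short derivation may help show where the value $-1$ comes from. It is sketched only for the extreme case in which the two raters never agree ($p_o = p_{11} + p_{22} = 0$). Substituting $p_o = 0$ into the definition of $\kappa$ gives

$$\kappa = \frac{0 - p_e}{1 - p_e} = -\frac{p_e}{1 - p_e}.$$

With two categories, $p_{11} = p_{22} = 0$ forces $p_{1+} = p_{12} = p_{+2}$ and $p_{2+} = p_{21} = p_{+1}$, so $p_e = 2p_{1+}p_{2+} \le 0.5$. The ratio therefore equals $-1$ only when $p_e = 0.5$ (both raters split the subjects evenly) and lies between $-1$ and 0 otherwise. This covers only the no-agreement case, not the minimum for arbitrary marginals.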

Example. Two doctors independently classified 100 people
into one of two diagnostic categories, abnormal or normal, as
follows.

Table 1. Diagnoses on n = 100 people by two doctors

                        Doctor B
                        Abnormal    Normal
Doctor A    Abnormal          40        10
            Normal            20        30

Observed agreement: $p_o = (40 + 30)/100 = 0.7$

Chance agreement:
$$p_e = \frac{40 + 10}{100} \times \frac{40 + 20}{100} + \frac{20 + 30}{100} \times \frac{10 + 30}{100} = 0.5 \times 0.6 + 0.5 \times 0.4 = 0.3 + 0.2 = 0.5$$

$$\kappa = \frac{0.7 - 0.5}{1 - 0.5} = 0.4.$$
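The same calculation can also be run directly on the individual diagnoses rather than on the tabulated counts. The Python sketch below uses a hypothetical helper, `cohens_kappa`, and rebuilds the 100 paired diagnoses of Table 1 to recompute $p_o$, $p_e$ and $\kappa$. Because it sums $p_{k+}p_{+k}$ over whatever labels appear, it also handles more than two categories.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa from two equal-length lists of category labels (hypothetical helper)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(ratings_a)
    categories = set(ratings_a) | set(ratings_b)
    # Observed agreement p_o: share of subjects given the same label by both raters
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Marginal counts for each rater (Counter returns 0 for unseen labels)
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    # Chance agreement p_e: sum over categories of p_{k+} * p_{+k}
    p_e = sum((count_a[c] / n) * (count_b[c] / n) for c in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Rebuild the 100 paired diagnoses of Table 1 (Doctor A first, Doctor B second)
cells = {("abnormal", "abnormal"): 40, ("abnormal", "normal"): 10,
         ("normal", "abnormal"): 20, ("normal", "normal"): 30}
doctor_a, doctor_b = [], []
for (a, b), count in cells.items():
    doctor_a += [a] * count
    doctor_b += [b] * count

kappa = cohens_kappa(doctor_a, doctor_b)
print(round(kappa, 3))  # 0.4
```

Up to floating-point rounding, this reproduces $\kappa = 0.4$. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(doctor_a, doctor_b)` should give the same unweighted value. The sketch deliberately avoids that dependency.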

However, kappa should not be used as a measure of agreement
when the raters or devices cannot be treated symmetrically.
When one source of ratings may be viewed as superior
or as a standard, e.g., one rater is more senior than the other or one medical
device is a more precise measuring instrument than the other,
kappa may no longer be appropriate.